Genomic markers of hepatitis B virus associated with hepatocellular carcinomas

ABSTRACT

The present invention provides methods of predicting a pre-disposition of HBV-infected individuals to develop hepatocellular carcinoma (HCC).

BACKGROUND OF THE INVENTION

Hepatitis B virus (HBV) infects over 300 million people worldwide. For those individuals with high levels of viral replication, chronic active hepatitis with progression to cirrhosis, liver failure and hepatocellular carcinoma (HCC) is common.

The natural progression of chronic HBV infection over a 10 to 20 year period leads to cirrhosis in 20 to 50% of patients, and progression of HBV infection to hepatocellular carcinoma has been well documented. There have been no studies that have determined sub-populations of hepatitis B virus that are most likely to cause hepatocellular carcinoma; thus, to date, all hepatitis B viruses have been considered of equal risk of hepatocarcinogenesis.

It is important to note that the survival for patients diagnosed with hepatocellular carcinoma is only 0.9 to 12.8 months from initial diagnosis (Takahashi et al., American Journal of Gastroenterology 88:240-243 (1993)). Treatment of hepatocellular carcinoma with chemotherapeutic agents has not proven effective and only 10% of patients will benefit from surgery due to extensive tumor invasion of the liver (Trinchet et al., Presse Medicine 23:831-833 (1994)). Given the aggressive nature of primary hepatocellular carcinoma, the only viable treatment alternative to surgery is liver transplantation (Pichlmayr et al., Hepatology 20:33S-40S (1994)).

BRIEF SUMMARY OF THE INVENTION

The present invention provides for methods of determining a pre-disposition of an individual infected with hepatitis B virus (HBV) to develop hepatocellular carcinoma (HCC). In some embodiments, the methods comprise:

-   (a) determining nucleotides in the genome of HBV isolated from the individual at positions corresponding to nucleotides 31, 53, 799, 1165, 1499, 1762, 1764, 2170, 2441, 2525, and/or 2712 of SEQ ID NO: 1; and
-   (b) comparing the determined nucleotides to nucleotides associated with a pre-disposition to cause HCC, wherein the nucleotides associated with a pre-disposition to cause HCC comprise: 31C, 53C, 799G, 1165T, 1499G, 1762T, 1764A, 2170C, 2170G, 2441C, 2525C, 2712C, 2712A, and/or 2712G.

In some embodiments, the methods comprise:

-   (a) determining nucleotides in the genome of a genotype B HBV isolated from the individual at positions corresponding to nucleotides 1165, 1762, 1764, 2525 or 2712 of SEQ ID NO: 1; and
-   (b) comparing the determined nucleotides to nucleotides associated with a pre-disposition to cause HCC, wherein the nucleotides associated with a pre-disposition to cause HCC comprise: 1165T, 1762T, 1764A, 2525C, 2712C, 2712A, or 2712G.

In some embodiments, the methods comprise:

-   (a) determining nucleotides in the genome of a genotype B HBV isolated from the individual at positions corresponding to nucleotides 1165, 1762, 1764, 2525 and 2712 of SEQ ID NO: 1; and
-   (b) comparing the determined nucleotides to nucleotides associated with a pre-disposition to cause HCC, wherein the nucleotides associated with a pre-disposition to cause HCC in genotype B comprise:
    -   1762T and 1764A and 2712A; or
    -   1762T and 1764A and 2712C; or
    -   1762T and 1764A and 2712G; or
    -   1762T and 1764A and 2712T and 2525C; or
    -   1762A and 1764G and 1165T.

In some embodiments, the method comprises determining the genotype of the HBV from the individual.

In some embodiments, the determining step comprises nucleotide sequencing the HBV genome flanking the nucleotides at positions corresponding to nucleotides 1165, 1762, 1764, 2525 and 2712 of SEQ ID NO: 1.

In some embodiments, the determining step comprises amplifying at least a portion of the HBV genome to produce one or more amplification products comprising the nucleotides at the positions corresponding to nucleotides 1165, 1762, 1764, 2525 and 2712 of SEQ ID NO: 1. In some embodiments, the method comprises contacting the one or more amplification products with one or more probes that hybridize to HCC-associated nucleotides:

-   1762T and 1764A and 2712A; or
-   1762T and 1764A and 2712C; or
-   1762T and 1764A and 2712G; or
-   1762T and 1764A and 2712T and 2525C; or
-   1762A and 1764G and 1165T;

under conditions to allow for hybridization of a probe to an amplification product only if the amplification product comprises a complementary nucleotide at the position of the HCC-associated nucleotide. In some embodiments, the hybridization is performed as a line probe assay.

In some embodiments, the method comprises:

-   (a) determining nucleotides in the genome of a genotype C HBV isolated from the individual at positions corresponding to nucleotides 31, 53, 799, 1499, 2170, or 2441; and
-   (b) comparing the determined nucleotides to nucleotides associated with a pre-disposition to cause HCC, wherein the nucleotides associated with a pre-disposition to cause HCC comprise: 31C, 53C, 799G, 1499G, 2170C, 2170G, or 2441C.

In some embodiments, the method comprises:

-   (a) determining the subtype of a genotype C HBV from the individual, wherein:
    -   subtype C1 comprises nucleotides 2783G and 2733A,
    -   subtype C2 comprises nucleotides 2783G, 2733C and 3033A, and
    -   subtype C3 comprises 2783G, 2733C and 3033C;
-   (b1) if the HBV is genotype C1, determining the nucleotides at positions corresponding to nucleotides 31, 53 and 1499 of SEQ ID NO: 1; or
-   (b2) if the HBV is genotype C2, determining the nucleotides at positions corresponding to nucleotides 799, 2441 and 2170 of SEQ ID NO: 1; and
-   (c) comparing the determined nucleotides to nucleotides at the positions associated with a pre-disposition to cause HCC, wherein the nucleotides associated with a pre-disposition to cause HCC in subtype C1 comprise:
    -   31C; and/or
    -   53C; and/or
    -   1499G; and

    the nucleotides associated with a pre-disposition to cause HCC in subtype C2 comprise:
    -   2170C; and/or
    -   2170G; and/or
    -   2441C; and/or
    -   799G.

In some embodiments, the determining step comprises nucleotide sequencing the HBV genome flanking the nucleotides at positions corresponding to nucleotides 31, 53, and 1499 of SEQ ID NO: 1. In some embodiments, the determining step comprises nucleotide sequencing the HBV genome flanking the nucleotides at positions corresponding to nucleotides 799, 2441, and 2170 of SEQ ID NO: 1. In some embodiments, the determining step comprises amplifying at least a portion of the HBV genome to produce one or more amplification products comprising the nucleotides at the positions corresponding to nucleotides 31, 53, and 1499 of SEQ ID NO: 1.

In some embodiments, the determining step comprises amplifying at least a portion of the HBV genome to produce one or more amplification products comprising the nucleotides at the positions corresponding to nucleotides 799, 2441, and 2170 of SEQ ID NO: 1.

In some embodiments, the method comprises contacting the one or more amplification products with one or more probes that hybridize to HCC-associated nucleotides:

-   31C; and/or
-   53C; and/or
-   1499G;

under conditions to allow for hybridization of a probe to an amplification product only if the amplification product comprises a complementary nucleotide at the position of the HCC-associated nucleotide.

In some embodiments, the hybridization is performed as a line probe assay.

In some embodiments, the method comprises contacting the one or more amplification products with probes that hybridize to HCC-associated nucleotides:

-   2170G; and/or
-   2441C; and/or
-   799G;

under conditions to allow for hybridization of the probes to the amplification product only if the amplification product comprises a complementary nucleotide at the position of the HCC-associated nucleotide.

In some embodiments, the hybridization is performed as a line probe assay.

In some embodiments, the method further comprises determining the genotype of the HBV from the individual.

In some embodiments, the method comprises:

-   determining the genotype of the HBV, wherein genotype B comprises 2783A, wherein genotype C1 comprises 2783G and 2733A, genotype C2 comprises 2783G, 2733C and 3033A and genotype C3 comprises 2783G, 2733C and 3033C;
-   determining nucleotides 1165, 1762, 1764, 2525 and 2712 of the HBV genome if the HBV is genotype B; and/or
-   determining nucleotides 31 and/or 53 and/or 1499 of the HBV genome if the HBV is C1; and/or
-   determining nucleotides 2170 and/or 2441 and/or 799 of the HBV genome if the HBV is C2; and
-   comparing the determined nucleotides to nucleotides associated with a pre-disposition to cause HCC,
-   wherein nucleotides associated with a pre-disposition to cause HCC in genotype B comprise:
    -   1762T and 1764A and 2712A; or
    -   1762T and 1764A and 2712C; or
    -   1762T and 1764A and 2712G; or
    -   1762T and 1764A and 2712T and 2525C; or
    -   1762A and 1764G and 1165T;
-   wherein nucleotides associated with a pre-disposition to cause HCC in genotype C1 comprise:
    -   31C; and/or
    -   53C; and/or
    -   1499G; and
-   wherein nucleotides associated with a pre-disposition to cause HCC in genotype C2 comprise:
    -   2170C; and/or
    -   2170G; and/or
    -   2441C; and/or
    -   799G;
-   thereby determining the pre-disposition of the individual to develop HCC.
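The comparing step of this embodiment can be sketched in Python as follows. This is a minimal illustration only, assuming the genotype has already been determined and that nucleotide calls are supplied as a mapping from SEQ ID NO: 1-equivalent position to base; the names `GENOTYPE_B_COMBINATIONS`, `SUBTYPE_MARKERS` and `hcc_associated` are hypothetical and do not appear in the specification.

```python
# Hypothetical sketch of the comparing step; all names are illustrative.

# Genotype B: any one of these combinations is HCC-associated.
GENOTYPE_B_COMBINATIONS = [
    {1762: "T", 1764: "A", 2712: "A"},
    {1762: "T", 1764: "A", 2712: "C"},
    {1762: "T", 1764: "A", 2712: "G"},
    {1762: "T", 1764: "A", 2712: "T", 2525: "C"},
    {1762: "A", 1764: "G", 1165: "T"},
]

# Genotypes C1 and C2: individual markers joined by "and/or".
SUBTYPE_MARKERS = {
    "C1": {31: {"C"}, 53: {"C"}, 1499: {"G"}},
    "C2": {2170: {"C", "G"}, 2441: {"C"}, 799: {"G"}},
}


def hcc_associated(genotype, calls):
    """Return True/False for genotypes B, C1, C2; None if no markers are listed.

    calls maps a position (numbered as in SEQ ID NO: 1) to the base found there.
    """
    if genotype == "B":
        return any(
            all(calls.get(pos) == base for pos, base in combo.items())
            for combo in GENOTYPE_B_COMBINATIONS
        )
    markers = SUBTYPE_MARKERS.get(genotype)
    if markers is None:
        return None  # e.g. genotype C3: the text lists no markers for it
    return any(calls.get(pos) in bases for pos, bases in markers.items())
```

For example, `hcc_associated("B", {1762: "T", 1764: "A", 2712: "C"})` matches the second genotype B combination. The function only reports whether a listed pattern is present; it does not quantify risk.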

The present invention also provides kits for detecting HBV isolates that are associated with the development of hepatocellular carcinoma (HCC).

In some embodiments, the kits comprise:

one or more probes which, when contacted to an HBV genome, selectively hybridize to the genome if the genome comprises at least one of the following nucleotides: 31C, 53C, 799G, 1165T, 1499G, 1762T, 1762A, 1764A, 1764G, 2441C, 2170C, 2170G, 2712A, 2712C, 2712G, or 2525C.

In some embodiments, the probe is linked to a solid support.

In some embodiments, the probe selectively hybridizes to:

-   1762T and 1764A and 2712A; and/or
-   1762T and 1764A and 2712C; and/or
-   1762T and 1764A and 2712G; and/or
-   1762T and 1764A and 2712T and 2525C; and/or
-   1762A and 1764G and 1165T.

In some embodiments, the probe selectively hybridizes to:

-   31C; and/or
-   53C; and/or
-   1499G.

In some embodiments, the probe selectively hybridizes to:

-   2170C; and/or
-   2170G; and/or
-   2441C; and/or
-   799G.

In some embodiments, the kits further comprise primers for amplification of at least a portion of the HBV genome.

The present invention also provides a computer readable medium for determining whether an HBV sequence is likely to result in the development of HCC. In some embodiments, the computer readable form comprises:

-   (a) code for receiving information describing: nucleotides at positions corresponding to nucleotides 31, 53, 799, 1165, 1499, 1762, 1764, 2170, 2441, 2525, or 2712 of SEQ ID NO: 1;
-   (b) code for comparing the nucleotides received in (a) to nucleotides associated with a pre-disposition to cause HCC; and
-   (c) code for providing a determination of the pre-disposition of the HBV to cause HCC,

wherein nucleotides associated with a pre-disposition to cause HCC comprise: 31C, 53C, 799G, 1165T, 1499G, 1762T, 1764A, 2170C, 2170G, 2441C, 2525C, 2712C, 2712A, or 2712G.
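One way the three code elements could be organized is sketched below in Python. The input format (whitespace-separated `position:base` tokens), the function names `receive`, `compare` and `report`, and the wording of the report are all assumptions made for illustration; the specification does not prescribe them.

```python
# Hypothetical sketch of elements (a)-(c); input format and names are assumed.

# HCC-associated nucleotides listed in the text, keyed by SEQ ID NO: 1 position.
HCC_ASSOCIATED = {
    31: {"C"}, 53: {"C"}, 799: {"G"}, 1165: {"T"}, 1499: {"G"},
    1762: {"T"}, 1764: {"A"}, 2170: {"C", "G"}, 2441: {"C"},
    2525: {"C"}, 2712: {"C", "A", "G"},
}


def receive(text):
    """(a) Parse tokens such as '31:C 1762:T' into {position: base}."""
    calls = {}
    for token in text.split():
        pos, base = token.split(":")
        calls[int(pos)] = base.upper()
    return calls


def compare(calls):
    """(b) Return the received nucleotides that are on the HCC-associated list."""
    return [
        f"{pos}{calls[pos]}"
        for pos in sorted(calls)
        if calls[pos] in HCC_ASSOCIATED.get(pos, ())
    ]


def report(matches):
    """(c) Turn the comparison into a plain-text determination."""
    if matches:
        return "HCC-associated nucleotides found: " + ", ".join(matches)
    return "No listed HCC-associated nucleotides found"
```

A call such as `report(compare(receive("31:c 1762:T 2712:T")))` would list 31C and 1762T but not 2712T, since 2712T is not among the listed nucleotides.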

Definitions

A probe “selectively hybridizes” to a viral genome comprising a particular nucleotide when the probe hybridizes to the genome when the particular nucleotide (at the specified position) is present, but does not hybridize if the nucleotide at the specified position is different or absent. Conditions to allow for hybridization of a probe to a particular DNA molecule only if a complementary nucleotide is present in a particular target DNA are generally “stringent hybridization conditions.”

The phrase “stringent hybridization conditions” refers to conditions under which a probe will hybridize to its target subsequence, typically in a complex mixture of nucleic acid, but to no other sequences, or at least to no other sequences at which a particular position is anything but one particular nucleotide. Stringent conditions are sequence-dependent and will be different in different circumstances. Longer sequences hybridize specifically at higher temperatures. An extensive guide to the hybridization of nucleic acids is found in Tijssen, Techniques in Biochemistry and Molecular Biology—Hybridization with Nucleic Probes, “Overview of principles of hybridization and the strategy of nucleic acid assays” (1993). Generally, stringent conditions are selected to be about 5-10° C. lower than the thermal melting point (T_(m)) for the specific sequence at a defined ionic strength and pH. The T_(m) is the temperature (under defined ionic strength, pH, and nucleic acid concentration) at which 50% of the probes complementary to the target hybridize to the target sequence at equilibrium (as the target sequences are present in excess, at T_(m), 50% of the probes are occupied at equilibrium). Stringent conditions for Southern hybridization are generally those in which the salt concentration is less than about 1.0 M sodium ion, typically about 0.01 to 1.0 M sodium ion concentration (or other salts) at pH 7.0 to 8.3 and the temperature is at least about 30° C. for short probes (e.g., 10 to 50 nucleotides) and at least about 60° C. for long probes (e.g., greater than 50 nucleotides). Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide. For selective hybridization, a positive signal is at least two times background, optionally 10 times background hybridization, i.e., hybridization to another nucleotide sequence with a different nucleotide at the position of interest. Exemplary stringent hybridization conditions can be as follows: 50% formamide, 5×SSC, and 1% SDS, incubating at 42° C., or 5×SSC, 1% SDS, incubating at 65° C., with wash in 0.2×SSC, and 0.1% SDS at 65° C. Such washes can be performed for 5, 15, 30, 60, 120, or more minutes.
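The text does not give a formula for T_(m). Purely as an illustration of the "about 5-10° C. lower than T_(m)" guideline, the Python sketch below estimates T_(m) with the Wallace rule of thumb (2° C. per A or T, 4° C. per G or C), which is a swapped-in approximation for short oligonucleotides and is not taken from the specification. The function names `wallace_tm` and `stringent_window` are hypothetical.

```python
# Illustrative only: the Wallace rule is an assumed approximation for short
# probes, not a method described in the specification.

def wallace_tm(probe):
    """Estimate T_m in degrees C as 2*(A+T) + 4*(G+C)."""
    probe = probe.upper()
    at = probe.count("A") + probe.count("T")
    gc = probe.count("G") + probe.count("C")
    if at + gc != len(probe):
        raise ValueError("probe must contain only A, C, G and T")
    return 2 * at + 4 * gc


def stringent_window(tm):
    """Return the (low, high) temperature range lying 10 to 5 degrees below T_m."""
    return (tm - 10, tm - 5)
```

For a 16-mer with eight A/T and eight G/C bases this gives an estimated T_(m) of 48° C. and a window of 38 to 43° C. Real assay conditions depend on salt, formamide and probe concentration, which this rule ignores.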

“Determining nucleotides in the genome of HBV at positions corresponding to” particular nucleotides of a reference sequence (e.g., SEQ ID NO: 1) refers to identifying a position in an isolated HBV genome that occurs in a position that is the equivalent of the particular position in the reference sequence. The variants identified in the present invention are not limited to predicting sequence pre-disposition of variants of SEQ ID NO: 1, but instead apply to any HBV strain carrying particular corresponding nucleotides. Thus, when the genome of an HBV isolate differs from SEQ ID NO: 1 (e.g., by changes in nucleotides or addition or deletion of nucleotides), it may be that a particular nucleotide associated with the development of HCC will not be in exactly the same position as it is in SEQ ID NO: 1. For example, the nucleotide corresponding to nucleotide 31C of SEQ ID NO: 1 may occur at position 32 of a particular HBV strain due to a one nucleotide insertion at an earlier position in the strain's genome. Nevertheless, position 32 of the HBV strain would correspond to position 31 of SEQ ID NO: 1, which can be readily illustrated in an alignment of the two sequences. As described herein, the corresponding nucleotide in the genome of an HBV isolate can be determined using an alignment algorithm such as BLAST.
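The position shift described in this definition can be made concrete with a short Python sketch. It assumes a pairwise alignment has already been computed by some tool and is supplied as two equal-length strings in which "-" marks a gap; the function name `corresponding_position` and the toy sequences in the usage note are hypothetical.

```python
# Minimal sketch, assuming a precomputed pairwise alignment with '-' as gap.

def corresponding_position(aligned_ref, aligned_iso, ref_pos):
    """Map a 1-based reference position to the 1-based isolate position.

    Returns None if the isolate has a deletion (gap) at that column.
    """
    if len(aligned_ref) != len(aligned_iso):
        raise ValueError("aligned sequences must have equal length")
    ref_count = iso_count = 0
    for ref_char, iso_char in zip(aligned_ref, aligned_iso):
        if ref_char != "-":
            ref_count += 1
        if iso_char != "-":
            iso_count += 1
        if ref_char != "-" and ref_count == ref_pos:
            return iso_count if iso_char != "-" else None
    raise ValueError("ref_pos lies outside the reference sequence")
```

With the toy alignment `"ACG-TACGT"` (reference) and `"ACGGTACGT"` (isolate), reference position 4 maps to isolate position 5, mirroring the one-nucleotide insertion example in which position 31 of SEQ ID NO: 1 corresponds to position 32 of the isolate.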

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates the locations of various primers used for amplification of HBV and the resulting amplified fragments relative to the HBV genome, represented as a line at the bottom of the figure.

FIGS. 2A and 2B illustrate the genome (SEQ ID NO: 1) of an exemplary HBV genotype B isolate comprising highlighted nucleotides associated with the development of HCC.

FIGS. 3A and 3B illustrate the genome (SEQ ID NO: 2) of an exemplary HBV genotype C1 isolate comprising highlighted nucleotides associated with the development of HCC.

FIGS. 4A and 4B illustrate the genome (SEQ ID NO: 3) of an exemplary HBV genotype C2 isolate comprising highlighted nucleotides associated with the development of HCC.

DETAILED DESCRIPTION OF THE INVENTION

I. Introduction

The present invention is based on the discovery that certain sequence variants of HBV are associated with the development of hepatocellular carcinoma (HCC) in individuals infected with HBV. Specifically, the presence of the following nucleotides in an HBV genome is associated with the development of HCC: 31C, 53C, 799G, 1165T, 1499G, 1762T, 1764A, 2170C, 2170G, 2441C, 2525C, 2712C, 2712A, or 2712G. Accordingly, the invention provides for methods of determining whether an individual infected with HBV has a predisposition for HCC by detecting the nucleotide sequence of the HBV variant infecting the individual. The invention also provides for kits comprising reagents to detect any of the specific variants associated with HCC and computer readable forms for applying the methods of the invention.

II. Detecting HBV Variants Associated with HCC

Any number of methods may be used to determine the nucleotides at the positions corresponding to nucleotides at positions 31, 53, 799, 1165, 1499, 1762, 1764, 2170, 2441, 2525, or 2712 of SEQ ID NO: 1.

In some embodiments, nucleotide sequencing is used to determine the nucleotides at particular positions of the HBV genome. Without intending to limit the invention, examples of nucleotide sequencing include chain termination sequencing. See, e.g., Sanger et al. Proc. Nat. Acad. Sci. USA 74:5463-5467 (1977); Sambrook et al., Molecular Cloning, A Laboratory Manual (2nd ed. 1989); Kriegler, Gene Transfer and Expression: A Laboratory Manual (1990); and Current Protocols in Molecular Biology (Ausubel et al., eds., 1994). Sequencing may be performed following amplification of the HBV genome or a fragment thereof. Direct sequencing of PCR generated amplicons by selectively incorporating boronated nuclease resistant nucleotides into the amplicons during PCR and digestion of the amplicons with a nuclease to produce sized template fragments may also be performed (Porter et al., Nucleic Acids Research 25(8):1611-1617 (1997)). Alternatively, microfluidic techniques such as those described in U.S. Patent Publication No. 2003/0215862 may be used. See also U.S. Patent Publication No. 2003/0152996 describing alternate sequencing methods.

Specific probes that bind to nucleotides at particular positions in the HBV genome may also be used to detect nucleotides in the HBV genome. Probes that detect the particular nucleotides associated with HCC may be used in a reverse hybridization assay format using immobilized oligonucleotide probes present at distinct locations on a solid support. More particularly, the Line Probe Assay (LiPA) may be used. The LiPA is a reverse hybridization assay using oligonucleotide probes immobilized as parallel lines on a solid support strip. See, e.g., PCT Publication No. WO 94/12670. In this assay, specific oligonucleotides may be immobilized at known locations on membrane strips and hybridized under strictly controlled conditions with the labeled PCR product. Different probes may be designed such that each probe on the strip comprises an HBV nucleotide sequence, or complement thereof, but contains a different nucleotide at a particular position. Amplifying an HBV genome, or fragment thereof, and hybridizing the amplification product to one or more probes specific for a particular variant will result in complete or at least preferential hybridization of one of the probes to the product, thereby indicating which nucleotide at the particular position is contained in the amplified genome. Hybridization conditions using this assay are generally set at a high stringency such that only one probe binds to the amplification product. Exemplary conditions may include, e.g., standard hybridization and washing conditions (e.g., 1×SSC buffer containing 0.1% sodium dodecyl sulfate at 62° C.).

Amplification of HBV

The HBV genome or a portion thereof may be amplified before the nucleotides at positions associated with HCC are determined. An “amplification” refers to any chemical, including enzymatic, reaction that results in increased copies of a template nucleic acid sequence. Amplification reactions include polymerase chain reaction (PCR) and ligase chain reaction (LCR) (see U.S. Pat. Nos. 4,683,195 and 4,683,202; PCR Protocols: A Guide to Methods and Applications (Innis et al., eds, 1990)), strand displacement amplification (SDA) (Walker, et al. Nucleic Acids Res. 20(7):1691-6 (1992); Walker PCR Methods Appl 3(1):1-6 (1993)), transcription-mediated amplification (Pfyffer, et al., J. Clin. Microbiol. 34:834-841 (1996); Vuorinen, et al., J. Clin. Microbiol. 33:1856-1859 (1995)), nucleic acid sequence-based amplification (NASBA) (Compton, Nature 350(6313):91-2 (1991)), rolling circle amplification (RCA) (Lisby, Mol. Biotechnol. 12(1):75-99 (1999); Hatch et al., Genet. Anal. 15(2):35-40 (1999)) and branched DNA signal amplification (bDNA) (see, e.g., Iqbal et al., Mol. Cell Probes 13(4):315-320 (1999)).

Amplified portions of the HBV genome (optionally labeled) may be hybridized to DNA comprising one or more HCC-associated nucleotides, or a complement thereof, thereby allowing for determination of the identity of nucleotides at a nucleotide position of interest. Alternatively, the probes may detect non-HCC-associated nucleotides, thereby allowing for detection of HCC-associated HBV variants by detecting a lack of hybridization.

In some embodiments, the amplified fragment of the genome will comprise more than one HCC-associated nucleotide. Thus, in some embodiments, the fragment will comprise any combination of positions corresponding to nucleotides at positions 31, 53, 799, 1165, 1499, 1762, 1764, 2170, 2441, 2525, and/or 2712 of SEQ ID NO: 1. In some embodiments, the fragment will comprise positions corresponding to nucleotides 1165, 1762, 1764, 2525 and 2712 of SEQ ID NO: 1. In some embodiments, the fragment will comprise positions corresponding to nucleotides 31, 53, 799, 1499, 2170, and 2441 of SEQ ID NO: 1.

In some cases, more than one fragment of HBV is amplified. In these cases, the sum of all fragments amplified may comprise any combination of positions corresponding to nucleotides at positions 31, 53, 799, 1165, 1499, 1762, 1764, 2170, 2441, 2525, or 2712 of SEQ ID NO: 1. For example, one fragment may comprise positions 31, 53, 799, 1165, 1499, 1762, and 1764, and a second fragment may comprise positions 2170, 2441, 2525, or 2712. In some embodiments, the sum of all amplified fragments will comprise positions corresponding to nucleotides 1165, 1762, 1764, 2525 and 2712 of SEQ ID NO: 1. In some embodiments, the sum of all amplified fragments will comprise positions corresponding to nucleotides 31, 53, 799, 1499, 2170, and 2441.
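Whether a set of amplified fragments together spans the positions of interest can be checked with a few lines of Python. The sketch below assumes each fragment is described by hypothetical (start, end) coordinates numbered as in SEQ ID NO: 1 and treats the genome as linear for simplicity; the name `uncovered_positions` and the coordinates in the usage note are illustrative and not taken from the specification.

```python
# Minimal sketch; fragment coordinates are assumed inputs, inclusive on both ends.

HCC_POSITIONS = [31, 53, 799, 1165, 1499, 1762, 1764, 2170, 2441, 2525, 2712]


def uncovered_positions(fragments, positions=HCC_POSITIONS):
    """Return the positions not contained in any (start, end) fragment."""
    return [
        pos
        for pos in positions
        if not any(start <= pos <= end for start, end in fragments)
    ]
```

With two assumed fragments spanning 1 to 1800 and 2100 to 2800, `uncovered_positions([(1, 1800), (2100, 2800)])` returns an empty list, while the first fragment alone leaves positions 2170, 2441, 2525 and 2712 uncovered.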

In some embodiments, amplification and detection methods are used in combination, and sometimes in the same reaction vessel, to detect HBV polynucleotides using detectably-labeled probes that distinguish between HCC-associated nucleotides and nucleotides not associated with HCC. Binding of a probe to its complementary hybridization sequence allows the user to quantify the accumulation of a particular sequence without necessarily removing the contents from the reaction vessel. In general, any type of label that allows for the detection and differentiation of different probes can be used according to the methods of the invention.

Accumulation of amplified product can be quantified by any method known to those in the art. For instance, fluorescence from a probe can be detected by measurement of light at a particular frequency. Similarly, the accumulation of various chemical products created via an enzymatic reaction linked to the probe can be measured, for instance, by measuring absorbance of light at a particular wavelength. In other embodiments, amplification reactions can be quantified directly by blotting them onto a solid support and hybridizing with a detectably-labeled nucleic acid probe. Once unbound probe is washed away, the amount of probe can be quantified by measuring radioactivity as is known to those of skill in the art. Other variations of this technique employ the use of chemiluminescence to detect hybridization events.

Measurement of amplification products can be performed after the reaction has been completed or can be performed in “real time” (i.e., as the reaction occurs). If measurement of accumulated amplified product is performed after amplification is complete, then detection reagents (e.g. probes) can be added after the amplification reaction. Alternatively, probes can be added to the reaction prior to or during the amplification reaction, thus allowing for measurement of the amplified products either after completion of amplification or in real time. Real time measurements can be particularly useful because they allow for measurement at any given cycle of the reaction and thus provide more information about accumulation of products throughout the reaction. For measurement of amplification product in real time, fluorescent probes are often used.

One amplification assay utilizing a FRET pair to detect an amplification product is the “TaqMan®” assay described in Gelfand et al. U.S. Pat. No. 5,210,015, and Livak et al. U.S. Pat. No. 5,538,848. The probe is a single-stranded oligonucleotide labeled with a FRET pair. In a TaqMan® assay, a DNA polymerase releases single or multiple nucleotides by cleavage of the oligonucleotide probe when it is hybridized to a target strand. That release provides a way to separate the quencher label and the fluorophore label of the FRET pair.

Another type of nucleic acid hybridization probe assay utilizing FRET pairs is described in Tyagi et al U.S. Pat. No. 5,925,517, which utilizes labeled oligonucleotide probes, which are referred to as “molecular beacons.” See Tyagi, S. and Kramer, F. R., Nature Biotechnology 14: 303-308 (1996). A molecular beacon probe is an oligonucleotide whose end regions hybridize with one another in the absence of target but are separated if the central portion of the probe hybridizes to its target sequence. The rigidity of the probe-target hybrid precludes the simultaneous existence of both the probe-target hybrid and the intramolecular hybrid formed by the end regions. Consequently, the probe undergoes a conformational change in which the smaller hybrid formed by the end regions disassociates, and the end regions are separated from each other by the rigid probe-target hybrid. For molecular beacon probes, a central target-recognition sequence is flanked by arms that hybridize to one another when the probe is not hybridized to a target strand, forming a “hairpin” structure, in which the target-recognition sequence (which is commonly referred to as the “probe sequence”) is in the single-stranded loop of the hairpin structure, and the arm sequences form a double-stranded stem hybrid. When the probe hybridizes to a target, that is, when the target-recognition sequence hybridizes to a complementary target sequence, a relatively rigid helix is formed, causing the stem hybrid to unwind and forcing the arms apart.

One of skill will recognize that a large number of different fluorophores can be used to label probes useful in the invention. Some fluorophores useful in the methods and composition of the invention include: fluorescein, fluorescein isothiocyanate (FITC), carboxy tetrachloro fluorescein (TET), NHS-fluorescein, 5 and/or 6-carboxy fluorescein (FAM), 5- (or 6-) iodoacetamidofluorescein, 5-{[2(and 3)-5-(Acetylmercapto)-succinyl]amino}fluorescein (SAMSA-fluorescein), and other fluorescein derivatives, rhodamine, Lissamine rhodamine B sulfonyl chloride, Texas red sulfonyl chloride, 5 and/or 6 carboxy rhodamine (ROX) and other rhodamine derivatives, coumarin, 7-amino-methyl-coumarin, 7-Amino-4-methylcoumarin-3-acetic acid (AMCA), and other coumarin derivatives, BODIPY™ fluorophores, Cascade Blue™ fluorophores such as 8-methoxypyrene-1,3,6-trisulfonic acid trisodium salt, Lucifer yellow fluorophores such as 3,6-Disulfonate-4-amino-naphthalimide, phycobiliprotein derivatives, Alexa fluor dyes (available from Molecular Probes, Eugene, Oreg.) and other fluorophores known to those of skill in the art. For a general listing of useful fluorophores, see Hermanson, G. T., BIOCONJUGATE TECHNIQUES (Academic Press, San Diego, 1996). Thus, each probe used in a reaction may fluoresce at a different wavelength and can be individually detected without interference from the other probes. This is useful, for example, if probes that detect different nucleotides at a particular position are used in a reaction. Thus, for example, one wavelength may indicate binding of a probe that detects 31T while a probe comprising a label with a different wavelength will detect 31C.

Preparing HBV from a Test Sample

The presence or amount of HBV nucleic acids in a test sample can be determined by amplifying the target regions within the HBV gene. Thus, any liquid or solid material believed to comprise HBV nucleic acids can be an appropriate sample. Preferred sample tissues include plasma, serum, whole blood, blood cells, lymphatic fluid, cerebrospinal fluid, synovial fluid and others.

As used herein, the term “test sample” refers to any liquid or solid material believed to comprise HBV nucleic acids. A test sample may be obtained from a biological source, such as cells in culture or a tissue sample from an animal, e.g., a human. Sample tissues of the instant invention may include, but are not limited to, plasma, serum, whole blood, blood cells, lymphatic fluid, cerebrospinal fluid, synovial fluid, urine, saliva, and skin or other organs (e.g. liver biopsy material).

Such samples will often be taken from patients suspected of having HBV infection, or having any of the wide spectrum of liver diseases related to HBV infection.

Nucleic acids representing the HBV gene of interest may be extracted from tissue samples. Various commercial nucleic acid purification kits, such as QIAamp 96 Virus BioRobot Kit and Qiagen's BioRobot 9604, are known to the skilled artisan and may be used to isolate HBV nucleic acids from samples.

III. Determination of HBV Genotype

The present methods may also involve a determination of the genotype of HBV in an individual. For example, particular nucleotide variants identified herein may have a stronger predisposition to cause HCC if the variants are found in one genotype than in another. In this context, “genotype” refers to one of the at least 8 genotypes of HBV (genotypes A, B, C, D, E, F, G, and H) deduced from genome comparisons. See, e.g., Westland C. Hepatology 36: 2-8 (2002); Borchani-Chabchoub I, et al., Microbes Infect 2: 607-12 (2000); Grandjacques C, et al., J Hepatol 33: 430-9 (2000); Kato H, et al., J Virol Methods 98: 153-9 (2001); Ashton-Rickardt P G, et al., J Med Virol 29: 204-14 (1989). Thus, by detecting nucleotides at particular positions identified to occur only in a specific genotype, one may determine the genotype of HBV. Of course, other methods such as serological methods may also be used.

In some embodiments, the presence or absence of the B or C genotype of HBV will be determined. Further, the subtype of genotype may also be determined. The B, C1, C2, or C3 genotype of HBV may be determined according to the following table:

    Nucleotide position   Genotype B   Genotype C1   Genotype C2   Genotype C3
    2783                  A            G             G             G
    2733                               A             C             C
    3033                                             A             C

Thus, for example, genotype C3 may be identified by detecting nucleotides 2783G, 2733C and 3033C at a position corresponding to positions 2783, 2733, and 3033, respectively in SEQ ID NO: 1.
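The genotype table can be read as a small decision procedure, sketched below in Python. The function name `hbv_genotype` and the choice to return `None` for any combination the table does not list are assumptions made for illustration.

```python
# Minimal sketch of the table lookup; bases are those at positions
# corresponding to 2783, 2733 and 3033 of SEQ ID NO: 1.

def hbv_genotype(n2783, n2733=None, n3033=None):
    """Return 'B', 'C1', 'C2' or 'C3', or None if the table has no match."""
    if n2783 == "A":
        return "B"
    if n2783 == "G":
        if n2733 == "A":
            return "C1"
        if n2733 == "C":
            if n3033 == "A":
                return "C2"
            if n3033 == "C":
                return "C3"
    return None  # combination not covered by the table
```

For example, `hbv_genotype("G", "C", "C")` yields C3, matching the example in the text. Other HBV genotypes (A and D to H) are not distinguished by this table and fall through to `None` here.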

The nucleotides associated with a particular genotype may be detected by any method useful for detecting nucleotide sequences, including all of those described herein (e.g., amplification, nucleotide sequencing and/or probes, etc.).

IV. Comparing Nucleotides of HBV with Nucleotides Associated with HCC

Nucleotide sequence information regarding an isolate from an individual may be compared to nucleotides associated with HCC by any method.

Where a nucleotide sequence of the isolate is determined, the sequence may be aligned with SEQ ID NO: 1 or another HBV genomic sequence to determine the position of the specific nucleotides of interest. Methods of alignment of sequences for comparison are well-known in the art. Optimal alignment of sequences for comparison can be conducted, e.g., by the local homology algorithm of Smith & Waterman, Adv. Appl. Math. 2:482 (1981), by the homology alignment algorithm of Needleman & Wunsch, J. Mol. Biol. 48:443 (1970), by the search for similarity method of Pearson & Lipman, Proc. Nat'l. Acad. Sci. USA 85:2444 (1988), by computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Dr., Madison, Wis.), or by manual alignment and visual inspection (see, e.g., Current Protocols in Molecular Biology (Ausubel et al., eds. 1995 supplement)).

Examples of algorithms that are suitable for aligning sequences and determining percent sequence identity and sequence similarity are the BLAST and BLAST 2.0 algorithms, which are described in Altschul et al., J. Mol. Biol. 215:403-410 (1990) and Altschul et al., Nuc. Acids Res. 25:3389-3402 (1997), respectively. BLAST and BLAST 2.0 may be used, with the parameters described herein, to determine an optimal alignment. Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/). This algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence, which either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence. T is referred to as the neighborhood word score threshold (Altschul et al., supra). These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them. The word hits are extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Cumulative scores are calculated using, for nucleotide sequences, the parameters M (reward score for a pair of matching residues; always >0) and N (penalty score for mismatching residues; always <0). For amino acid sequences, a scoring matrix is used to calculate the cumulative score. Extension of the word hits in each direction is halted when: the cumulative alignment score falls off by the quantity X from its maximum achieved value; the cumulative score goes to zero or below, due to the accumulation of one or more negative-scoring residue alignments; or the end of either sequence is reached. The BLAST algorithm parameters W, T, and X determine the sensitivity and speed of the alignment. The BLASTN program (for nucleotide sequences) uses as defaults a wordlength (W) of 11, an expectation (E) of 10, M=5, N=−4 and a comparison of both strands; it is generally useful to turn off the complexity filter.
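The extension step described above may be easier to follow as a small sketch. The Python function below extends a word hit to the right without gaps, using the M and N values quoted in the text and the three halting conditions listed. It is a simplified illustration and not the NCBI implementation; the function name `extend_right`, its arguments, and the drop-off value `x_drop=20` are assumptions chosen for the example.

```python
def extend_right(query, subject, q_pos, s_pos, seed_score,
                 match=5, mismatch=-4, x_drop=20):
    """Ungapped rightward extension of a word hit.

    q_pos / s_pos are the 0-based indices just past the seed in each sequence.
    Returns (best_score, best_length), where best_length is the number of
    residues beyond the seed at which the maximum score was reached.
    """
    score = best = seed_score
    best_len = 0
    length = 0
    # Condition 3: stop when the end of either sequence is reached.
    while q_pos + length < len(query) and s_pos + length < len(subject):
        same = query[q_pos + length] == subject[s_pos + length]
        score += match if same else mismatch
        length += 1
        if score > best:
            best, best_len = score, length
        # Condition 1: score fell off by X from its maximum.
        # Condition 2: cumulative score went to zero or below.
        if best - score >= x_drop or score <= 0:
            break
    return best, best_len
```

With a 4-base seed scored at 20, `extend_right("ACGTACGTACGTTTTT", "ACGTACGTACGAAAAA", 4, 4, 20)` extends over seven further matches and then stops once five consecutive mismatches have lowered the score by 20.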

Positions of nucleotides of interest are provided throughout this application with reference to the first C of the first EcoRI cleavage site (GAACTCC) that generally occurs in the HBV genome. The first “C” is position 1 of SEQ ID NO: 1. Thus, following alignment of a sequence of interest with SEQ ID NO: 1, a particular nucleotide of the sequence of interest may be assigned a position relative to the corresponding position in the alignment with SEQ ID NO: 1.
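As an illustration of assigning positions after alignment, the Python sketch below walks a pairwise alignment and reports the isolate's base at a given reference position. It assumes the alignment is supplied as two gapped strings of equal length, with the reference (e.g., SEQ ID NO: 1) starting at its position 1; the function name and the `-` gap character are assumptions for this example.

```python
def nucleotide_at_reference_position(aligned_ref, aligned_query, position, gap="-"):
    """Return the query base aligned to 1-based `position` of the reference.

    Returns None if the query has a gap (deletion) at that position.
    Columns where the reference has a gap (insertions in the query) do not
    advance the reference coordinate.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    ref_pos = 0
    for ref_base, query_base in zip(aligned_ref, aligned_query):
        if ref_base == gap:
            continue                      # insertion relative to the reference
        ref_pos += 1
        if ref_pos == position:
            return None if query_base == gap else query_base.upper()
    raise IndexError("position lies beyond the end of the aligned reference")
```

For instance, with `aligned_ref = "CTC-CACC"` and `aligned_query = "CTCACA-C"`, reference position 4 maps to `"C"` and reference position 6 maps to `None` because the isolate has a deletion there.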

The presence of any of the following nucleotides is indicative of a pre-disposition for HCC: 31C, 53C, 799G, 1165T, 1499G, 1762T, 1764A, 2170C, 2170G, 2441C, 2525C, 2712C, 2712A, or 2712G. While those of skill in the art will recognize that any number of algorithms may be useful for predicting a predisposition for developing HCC, as described in the Example, particularly good sensitivity and specificity may be obtained using the following algorithm:

For genotype B HBV, the presence of:

-   -   1762T and 1764A and 2712A; or
    -   1762T and 1764A and 2712C; or
    -   1762T and 1764A and 2712G; or
    -   1762T and 1764A and 2712T and 2525C; or
    -   1762A and 1764G and 1165T, indicates a pre-disposition for HCC.

For genotype C1 HBV, the presence of:

-   -   31C; and/or
    -   53C; and/or
    -   1499G, indicates a pre-disposition for HCC.

For genotype C2 HBV, the presence of:

-   -   2170C; and/or
    -   2170G; and/or
    -   2441C; and/or
    -   799G, indicates a pre-disposition for HCC.
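The genotype-specific rules above can be written compactly in code. The following Python sketch is one possible encoding, offered only as an illustration: the function name `predisposed_to_hcc`, the position-to-base mapping, and the choice to raise an error for genotypes without a listed rule (such as C3) are assumptions of this example, not features recited in the text.

```python
from typing import Mapping


def predisposed_to_hcc(genotype: str, nt: Mapping[int, str]) -> bool:
    """Apply the genotype B, C1 and C2 rules to bases numbered as in SEQ ID NO:1."""

    def has(pos: int, base: str) -> bool:
        return nt.get(pos, "").upper() == base

    if genotype == "B":
        if has(1762, "T") and has(1764, "A"):
            # 1762T/1764A together with 2712A, 2712C or 2712G
            if nt.get(2712, "").upper() in {"A", "C", "G"}:
                return True
            # 1762T/1764A together with 2712T and 2525C
            if has(2712, "T") and has(2525, "C"):
                return True
        # 1762A/1764G together with 1165T
        return has(1762, "A") and has(1764, "G") and has(1165, "T")

    if genotype == "C1":
        return has(31, "C") or has(53, "C") or has(1499, "G")

    if genotype == "C2":
        return (nt.get(2170, "").upper() in {"C", "G"}
                or has(2441, "C") or has(799, "G"))

    # No rule is listed for other genotypes; this sketch refuses to guess.
    raise ValueError(f"no rule listed for genotype {genotype!r}")
```

For example, a genotype B isolate with 1762T, 1764A, 2712T and 2525C is flagged, whereas the same isolate with 2525A is not.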

In some embodiments of the invention, it is useful to apply the above-listed algorithms in a computer readable form. The code for performing any of the functions described herein can be executed by digital computers and may be stored on any suitable computer readable media. Examples of computer readable media include magnetic, electronic, or optical disks, tapes, sticks, chips, etc. The code for performing any of the functions described herein may also be written in any suitable computer programming language including, for example, Fortran, C, C++, etc. The graphical user interfaces and functions underlying the graphical user interfaces can be created using an object oriented programming language such as Java.

V. Benefits of Identifying Individuals Pre-Disposed for HCC

The conventional methods of surveillance for HCC are testing an infected person's serum alpha-fetoprotein levels (see, e.g., Liaw Y F et al., Gastroenterology, 30:263-267 (1986); Colombo M. et al., N. Engl. J. Med., 325:675-680 (1991); Oka H. et al., Hepatology 12:680-687 (1990)) or subjecting the person to abdominal ultrasound scanning. Another method for diagnosis of HCC is detecting des-gamma-carboxy prothrombin (Chan C Y et al., J Hepatol. 13:21-24 (1991); Weitz I C et al., Hepatology 18:990-997 (1993)). Another marker for HCC is TGF-β1. See, e.g., US Patent Publication No. 2004/0121414.

However, without information regarding which patients may be pre-disposed for HCC, it is necessary to screen every person infected with HBV on a regular basis to catch HCC as early as possible. Unfortunately, given the large number of people infected with HBV, as well as the finite resources available to screen individuals, it is impossible to perform all of the necessary screens. The present invention addresses this problem by indicating which individuals should have intense surveillance for the initial signs of HCC and which individuals do not require such intense surveillance. Thus, the present invention provides for detecting those individuals carrying HBV that is pre-disposed to cause HCC and then further testing those individuals on a regular basis for the presence of HCC and, optionally, only rarely or never testing those individuals without HCC-associated HBV variants.

VI. Kits

Kits comprising the components needed in the methods (typically in an unmixed form) and kit components (packaging materials, instructions for using the components and/or the methods, one or more containers (reaction tubes, columns, etc.) for holding the components) are a feature of the present invention. Kits of the present invention may contain reagents for detecting any one or more of the following nucleotide variants in an HBV genome: 31C, 53C, 799G, 1165T, 1499G, 1762T, 1764A, 2170C, 2170G, 2441C, 2525C, 2712C, 2712A, and/or 2712G. For example, the kits of the invention may comprise combinations of primers and/or probes as described herein for the detection of nucleotide variants associated with HCC. Optionally, the kits may contain reagents for amplification, including but not limited to, thermostable polymerases such as Taq polymerase, nucleotides, buffers, etc.

EXAMPLE

Our goal was to discover genetic markers of HCC cases from HBV DNA sequences. In other words, we built a classification model based on HBV DNA to predict cancer. Several classification models, including Naive Bayes, Decision Tree, Neural Networks, and Rule Learning Using Evolutionary Algorithm, were applied to classify the DNA datasets. The experimental results showed that the Rule Learning Using Evolutionary Algorithm had the best performance. In this section, we present the results of applying the Rule Learning Using Evolutionary Algorithm to classify the HBV DNA data in liver cancer (HCC) and normal cases.

Experimental Methodology

For each experiment, 90% of the samples were selected randomly as the training set and the remaining 10% formed the testing set. For each dataset, the experiment was repeated 10 times.
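A repeated random 90%/10% split of this kind might be sketched as follows. This is only an illustration of the sampling scheme described; the function name `repeated_holdout_splits`, the fixed random seed, and the rounding of the test-set size are assumptions, and the text does not state whether the original splits were stratified by class.

```python
import random


def repeated_holdout_splits(samples, test_fraction=0.10, repeats=10, seed=0):
    """Return `repeats` independent (training, testing) random splits of `samples`."""
    rng = random.Random(seed)            # seeded only to make the sketch reproducible
    n_test = max(1, round(len(samples) * test_fraction))
    splits = []
    for _ in range(repeats):
        shuffled = list(samples)
        rng.shuffle(shuffled)
        # The first n_test shuffled items form the testing set, the rest the training set.
        splits.append((shuffled[n_test:], shuffled[:n_test]))
    return splits
```

Each repeat draws a fresh random partition, so a given sample may fall in the testing set in some repeats and in the training set in others.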

In medical diagnosis and disease prediction problems, the performance of an algorithm or model is judged not only by accuracy, but also by sensitivity and specificity. Sensitivity is generally more important than specificity and accuracy in medical diagnoses because doctors and patients prefer not to miss any patients with diseases. Extra diagnosis and tests can be performed to confirm the prediction and remove initial false positives.

We evaluated our model on all three of these measures:

$\text{Accuracy} = \frac{\text{True Positive} + \text{True Negative}}{\text{True Positive} + \text{True Negative} + \text{False Positive} + \text{False Negative}}$

$\text{Sensitivity} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}}$

$\text{Specificity} = \frac{\text{True Negative}}{\text{True Negative} + \text{False Positive}}$

The true positive is the number of all the patients with the disease and a positive test result, whereas the true negative is the number of all the patients without the disease and a negative test result. The false positive is the number of all the patients without the disease but a positive test result, whereas the false negative is the number of all the patients with the disease but a negative test result. In medical diagnosis, a false negative is the most undesirable case.
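The three measures follow directly from the four counts just defined. The short Python sketch below computes them; the function name `diagnostic_metrics` is an assumption for this example, and the counts used in the usage note are hypothetical rather than taken from the study. Accuracy is taken as the fraction of all cases classified correctly.

```python
def diagnostic_metrics(tp, tn, fp, fn):
    """Compute accuracy, sensitivity and specificity from confusion-matrix counts.

    Assumes at least one diseased case (tp + fn > 0) and at least one
    disease-free case (tn + fp > 0), so that no denominator is zero.
    """
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,       # correct calls over all cases
        "sensitivity": tp / (tp + fn),       # diseased cases correctly detected
        "specificity": tn / (tn + fp),       # disease-free cases correctly cleared
    }
```

With hypothetical counts of 30 true positives, 40 true negatives, 10 false positives and 20 false negatives, the function gives an accuracy of 0.7, a sensitivity of 0.6 and a specificity of 0.8.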

Results

Data Description

Genotype B and genotype C data were separated for analysis. The proportion of patients in each genotype or C subtype is shown in Table 2. “CON” refers to “control,” i.e., no HCC.

TABLE 2

| Datasets | CON | HCC | Total | % |
|---|---|---|---|---|
| B | 49 | 37 | 86 | 43.8776 |
| C1 | 10 | 16 | 26 | 13.2653 |
| C2 | 18 | 22 | 40 | 20.4082 |
| C3 | 19 | 25 | 44 | 22.4490 |
| Total | 96 | 100 | 196 | 100 |

Genotype B

Table 3 shows the details of the markers for HBV genotype B.

TABLE 3 HBV markers for HCC of genotype B

| Markers | Normal value | HCC-related value |
|---|---|---|
| 1762, 1764 | AG | TA |
| 1165 | C | T |
| 2712 | T | C (A, G) |
| 2525 | A, T | C |

The classification rules based on the applied data cleansing process for genotype B are as follows:

If 1762A and 1764G and 1165T are present in genotype B, then HCC is likely to occur.

If 1762T and 1764A and 2712A, 2712C or 2712G are present in genotype B, then HCC is likely to occur.

If 1762T and 1764A and 2712T and 2525C are present in genotype B, then HCC is likely to occur.

The experimental results for the genotype B dataset are shown in Table 4.

TABLE 4 Results of genotype B HBV dataset to predict HCC

| Results | Training set | STD | Testing set | STD |
|---|---|---|---|---|
| Sensitivity | 0.75029 | (0.05361) | 0.75 | (0.16667) |
| Specificity | 0.68 | (0.06215) | 0.66 | (0.13499) |
| Accuracy | 0.7093 | (0.02615) | 0.70 | (0.07499) |

C1 Subgroup

Table 5 shows the details of the markers for the C1 subgroup.

TABLE 5 HCC related markers for C1 subgroup

| Markers | Normal value | HCC-related value |
|---|---|---|
| 31 | T | C |
| 53 | T | C |
| 1499 | A | G |

The classification rules based on the applied data cleansing process for the C1 subgroup are as follows:

If 31C or 53C or 1499G is present in genotype C1, then HCC is likely to occur.

The experimental results for the C1 subgroup are shown in Table 6.

TABLE 6 Results of genotype C1 HBV dataset to predict HCC

| Results | Training set | STD | Testing set | STD |
|---|---|---|---|---|
| Sensitivity | 0.80769 | (0.04054) | 0.75 | (0.26252) |
| Specificity | 0.7875 | (0.06038) | 0.7 | (0.48305) |
| Accuracy | 0.8 | (0.03012) | 0.7333 | (0.21082) |

C2 Subgroup

Table 7 shows the details of the markers for the C2 subgroup.

TABLE 7 HCC related markers for C2 subgroup

| Markers | Normal value | HCC-related value |
|---|---|---|
| 2170 | T | C, G |
| 2441 | T | C |
| 799 | A | G |

The classification rules based on the applied data cleansing process for the C2 subgroup are as follows:

If 2170C or 2170G or 2441C or 799G is present in genotype C2, then HCC is likely to occur.

The experimental results for the C2 subgroup are shown in Table 8.

TABLE 8 Results of C2 genotype dataset to predict HCC

| Results | Training set (STD) | Testing set (STD) |
|---|---|---|
| Sensitivity | 0.84706 (0.06323) | 0.85 (0.24152) |
| Specificity | 0.97857 (0.0345) | 1 (0.00000) |
| Accuracy | 0.90645 (0.0355) | 0.925 (0.12076) |

Patients and Methods

Patients

Residual serum samples of one hundred chronic hepatitis B patients suffering from hepatocellular carcinoma (HCC) and one hundred age-matched control patients who had chronic hepatitis B but without hepatocellular carcinoma were studied. Consecutive patients with a confirmed diagnosis of HCC who had positive HBsAg attending the Joint Hepatoma Clinic, Prince of Wales Hospital from July 1999 to 2001 were included. Confirmed diagnosis of HCC is defined by either histology or radiological evidence of a hepatic mass with a serum alpha-fetoprotein (AFP) of 500 μg/l or more. Patients who had positive anti-HCV or a history of alcoholism were excluded. Informed consent to provide a serum sample for experimental study was routinely obtained from patients in the Joint Hepatoma Clinic. Relevant clinical information of enrolled patients was collected retrospectively.

Age-matched control patients were identified from the cohort of chronic hepatitis B patients prospectively followed up in the Hepatitis Clinic since December 1997. Patients who had other possible causes of hepatitis or liver cirrhosis, including autoimmune liver disease, primary biliary cirrhosis, Wilson's disease and hemochromatosis, were also excluded. At initial presentation, abdominal ultrasound was performed to exclude any pre-existing HCC. Patients were prospectively followed up every 6 months, or more frequently if clinically indicated, with monitoring of liver biochemistry, HBeAg and anti-HBe status as well as alpha-fetoprotein levels. Abdominal ultrasound, computerized tomography, hepatic angiogram and/or liver biopsy were performed whenever the alpha-fetoprotein level was higher than 50 μg/l or on a rising trend over 20 μg/l to confirm the diagnosis of HCC. For patients with normal alpha-fetoprotein levels, abdominal ultrasound was performed every 1-2 years.

Laboratory Method

Extraction Of DNA

Serum viral DNA was extracted using the QIAamp DNA Blood Mini Kit (Qiagen, Calif., USA) according to the manufacturer's instructions.

Amplification of HBV DNA

To obtain the full-length HBV DNA sequence, a long distance semi-nested PCR was performed to amplify three overlapping fragments (A, B and C). Relative positions of these PCR fragments to the map of the HBV genome are shown in FIG. 1 and the nucleotide sequences of the PCR primers can be found in Table 1.

TABLE 1 The sequences of primers used for amplifying and sequencing the HBV DNA

| Name | Nucleotide sequence (5′ → 3′) | Nt positions | Direction |
|---|---|---|---|
| Primers used for PCR | | | |
| P1 | TTTTTCACCTCTGCCTAATCA | 1821-1841 | sense |
| P2 | CCCTAGAAAATTGAGAGAAGTC | 262-283 | antisense |
| P3^(a) | CCACTGCATGGCCTGAGGATG | 3193-3213 | antisense |
| P4 | GCCTCATTTTGTGGGTCACCATA | 2801-2824 | sense |
| P5 | TTCTTTGACATACTTTCCA | 979-997 | antisense |
| P6^(a) | TTGGGGTGGAGCCCTCAGGCT | 3070-3090 | sense |
| P7^(a) | TTGGCCAAAATTCGCAGTC | 300-318 | sense |
| P8^(a) | CCCCACTGTTTGGCTTTCAG | 714-734 | sense |
| P9^(a) | GTTGATAAGATAGGGGCATTTGGTGG | 2299-2325 | antisense |
| Primers used for sequencing | | | |
| S1 | CTCCGGAACATTGTTCACCT | 2031-2050 | sense |
| S2 | AAGGTGGGAAACTTTACTGGGC | 2469-2490 | sense |
| S3 | GCTGACGCAACCCCCACTGG | 1186-1205 | sense |
| S4 | TCGCATGGAGACCACCGTGA | 1604-1623 | sense |
| S5 | GGCAAAAACGAGAGTAACTC | 1940-1959 | antisense |
| S6 | GGGTCGTCCGCGGGATTCAG | 1441-1460 | antisense |
| S7 | GACATACTTTCCAATCAATAGG | 970-991 | antisense |
| S8 | GAAGATGAGGCATAGCAGCAGG | 411-433 | antisense |
| S9 | CATGCTGTAGCTCTTGTTCC | 2831-2850 | antisense |

^(a)These primers were also used for sequencing.

Fragment A

When amplifying fragment A, 5 μl of the extracted DNA was subjected to PCR in the presence of 50 mM KCl, 1.5 mM MgCl₂, 10 mM Tris-HCl, 200 μM of each dNTP, 1.25 units Taq DNA polymerase (Amersham Biosciences), 1.5 units Pfu DNA polymerase (Promega), and 10 pmol each of primers P1 and P2 in a final volume of 50 μl. PCR was carried out with a 5-min initial denaturation at 95° C., followed by 10 cycles of amplification (94° C., 36 sec; 60° C., 36 sec; 72° C., 2.5 min), then 30 cycles of amplification (94° C., 36 sec; 50° C., 36 sec; 72° C., 2.5 min) and a 7-min final extension at 72° C.

The PCR product was further amplified in a semi-nested PCR. One microliter of the product was subjected to PCR in the presence of 50 mM KCl, 1.5 mM MgCl₂, 10 mM Tris-HCl, 200 μM of each dNTP, 2.5 units Taq DNA polymerase (Amersham Biosciences) and 10 pmol each of primers P1 and P3 in a final volume of 50 μl. PCR was carried out with a 5-min initial denaturation at 95° C., followed by 10 cycles of amplification (94° C., 36 sec; 60° C., 36 sec; 72° C., 2 min), then 30 cycles of amplification (94° C., 36 sec; 52° C., 36 sec; 72° C., 2 min) and a 7-min final extension at 72° C. Finally, the quality and quantity of the PCR product were examined on a 1.0% agarose/EtBr gel run in 1×TBE buffer.

Fragment B

When amplifying fragment B, 5 μl of the extracted DNA was subjected to PCR in the presence of 50 mM KCl, 1.5 mM MgCl₂, 10 mM Tris-HCl, 200 μM of each dNTP, 1.25 units Taq DNA polymerase (Amersham Biosciences), 1.5 units Pfu DNA polymerase (Promega), and 10 pmol each of primers P4 and P5 in a final volume of 50 μl. PCR was carried out with a 5-min initial denaturation at 95° C., followed by 10 cycles of amplification (94° C., 36 sec; 60° C., 36 sec; 72° C., 90 sec), then 30 cycles of amplification (94° C., 36 sec; 50° C., 36 sec; 72° C., 90 sec) and a 7-min final extension at 72° C.

The PCR product was further amplified in a semi-nested PCR. One microliter of the product was subjected to PCR in the presence of 50 mM KCl, 1.5 mM MgCl₂, 10 mM Tris-HCl, 200 μM of each dNTP, 2.5 units Taq DNA polymerase (Amersham Biosciences) and 10 pmol each of primers P5 and P6 in a final volume of 50 μl. PCR was carried out with a 5-min initial denaturation at 95° C., followed by 10 cycles of amplification (94° C., 36 sec; 60° C., 36 sec; 72° C., 90 sec), then 30 cycles of amplification (94° C., 36 sec; 52° C., 36 sec; 72° C., 90 sec) and a 7-min final extension at 72° C. Finally, the quality and quantity of the PCR product were examined on a 1.0% agarose/EtBr gel run in 1×TBE buffer.

Fragment C

When amplifying fragment C, 5 μl of the extracted DNA was subjected to PCR in the presence of 50 mM KCl, 1.5 mM MgCl₂, 10 mM Tris-HCl, 200 μM of each dNTP, 1.25 units Taq DNA polymerase (Amersham Biosciences), 1.5 units Pfu DNA polymerase (Promega), and 10 pmol each of primers P7 and P9 in a final volume of 50 μl. PCR was carried out with a 5-min initial denaturation at 95° C., followed by 10 cycles of amplification (94° C., 36 sec; 60° C., 36 sec; 72° C., 2 min and 15 sec), then 30 cycles of amplification (94° C., 36 sec; 50° C., 36 sec; 72° C., 2 min and 15 sec) and a 7-min final extension at 72° C.

The PCR product was further amplified in a semi-nested PCR. One microliter of the product was subjected to PCR in the presence of 50 mM KCl, 1.5 mM MgCl₂, 10 mM Tris-HCl, 200 μM of each dNTP, 2.5 units Taq DNA polymerase (Amersham Biosciences) and 10 pmol each of primers P8 and P9 in a final volume of 50 μl. PCR was carried out with a 5-min initial denaturation at 95° C., followed by 10 cycles of amplification (94° C., 36 sec; 60° C., 36 sec; 72° C., 1 min and 50 sec), then 30 cycles of amplification (94° C., 36 sec; 52° C., 36 sec; 72° C., 1 min and 50 sec) and a 7-min final extension at 72° C. Finally, the quality and quantity of the PCR product were examined on a 1.0% agarose/EtBr gel run in 1×TBE buffer.

DNA Sequencing

All semi-nested PCR products (plus and minus strands) were directly sequenced with the DYEnamic ET Dye Terminator Cycle Sequencing Kit for MegaBACE (Amersham Biosciences).

Primers for the sequencing of the three HBV DNA fragments (primer sequences are listed in Table 1):

-   Fragment A: S1, S2, P3, S9
-   Fragment B: P6, P7, S7, S8
-   Fragment C: P8, S3, S4, P9, S5, S6

One microliter of unpurified PCR product was used as the DNA template for cycle sequencing. It was subjected to a sequencing reaction in the presence of 8 μl of DYEnamic ET reagent premix and 10 pmol primer in a final volume of 20 μl. The sequencing reaction mix was subjected to a 2-min initial denaturation at 95° C., followed by 30 cycles of 95° C., 25 sec; 52° C., 30 sec; 60° C., 60 sec.

The sequencing products were purified by post-reaction clean-up using ethanol precipitation. To each reaction tube, 2 μl of 7.5 M ammonium acetate and 2.5 volumes (55 μl) of 100% ethanol were added so that the final concentration of ethanol was approximately 70%. The tube was then centrifuged at 4,000 rpm for 30 min at 14° C. Afterwards, the supernatant was drawn off by performing a brief inverted spin (1 min at 500 rpm). The DNA pellet was washed with 100 μl of 70% ethanol and centrifuged at 4,000 rpm for 15 min at 14° C., and the supernatant was again drawn off by a brief inverted spin (1 min at 500 rpm). The DNA pellet was then allowed to air dry and was resuspended in 10 μl of loading buffer (70% formamide and 1 mM EDTA). The samples were stored at 4° C. before gel electrophoresis analysis using the MegaBACE 1000 DNA sequencer.
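The stated ethanol volume and final concentration can be checked with a short calculation, assuming the precipitation is performed on the full 20 μl sequencing reaction described above. Adding 2 μl of ammonium acetate gives 22 μl, so 2.5 volumes of ethanol is

$2.5 \times (20 + 2)\ \mu\text{l} = 55\ \mu\text{l}$

and the final ethanol fraction is

$\frac{55}{20 + 2 + 55} = \frac{55}{77} \approx 0.71$

which is consistent with the nominal 70% given in the protocol.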

It is understood that the examples and embodiments described herein are for illustrative purposes only and that various modifications or changes in light thereof will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application and scope of the appended claims. All publications, patents, and patent applications cited herein are hereby incorporated by reference in their entirety for all purposes.

TABLE 1 The sequences of primers used for amplifying and sequencing the HBV DNA

| Name | Nucleotide sequence (5′ → 3′) | Nt positions | Direction |
|---|---|---|---|
| Primers used for PCR (SEQ ID NOS:4-12) | | | |
| P1 | TTTTTCACCTCTGCCTAATCA | 1821-1841 | sense |
| P2 | CCCTAGAAAATTGAGAGAAGTC | 262-283 | antisense |
| P3^(a) | CCACTGCATGGCCTGAGGATG | 3193-3213 | antisense |
| P4 | GCCTCATTTTGTGGGTCACCATA | 2801-2824 | sense |
| P5 | TTCTTTGACATACTTTCCA | 979-997 | antisense |
| P6^(a) | TTGGGGTGGAGCCCTCAGGCT | 3070-3090 | sense |
| P7^(a) | TTGGCCAAAATTCGCAGTC | 300-318 | sense |
| P8^(a) | CCCCACTGTTTGGCTTTCAG | 714-734 | sense |
| P9^(a) | GTTGATAAGATAGGGGCATTTGGTGG | 2299-2325 | antisense |
| Primers used for sequencing (SEQ ID NOS:13-21) | | | |
| S1 | CTCCGGAACATTGTTCACCT | 2031-2050 | sense |
| S2 | AAGGTGGGAAACTTTACTGGGC | 2469-2490 | sense |
| S3 | GCTGACGCAACCCCCACTGG | 1186-1205 | sense |
| S4 | TCGCATGGAGACCACCGTGA | 1604-1623 | sense |
| S5 | GGCAAAAACGAGAGTAACTC | 1940-1959 | antisense |
| S6 | GGGTCGTCCGCGGGATTCAG | 1441-1460 | antisense |
| S7 | GACATACTTTCCAATCAATAGG | 970-991 | antisense |
| S8 | GAAGATGAGGCATAGCAGCAGG | 411-433 | antisense |
| S9 | CATGCTGTAGCTCTTGTTCC | 2831-2850 | antisense |

^(a)These primers were also used for sequencing.

1. A method of determining a pre-disposition of an individual infected with hepatitis B virus (HBV) to develop hepatocellular carcinoma (HCC), the method comprising: determining nucleotides in the genome of HBV isolated from the individual at positions corresponding to nucleotides 31, 53, 799, 1165, 1499, 1762, 1764, 2170, 2441, 2525, and/or 2712 of SEQ ID NO:1; and comparing the determined nucleotides to nucleotides associated with a pre-disposition to cause HCC, wherein the nucleotides associated with a pre-disposition to cause HCC comprise: 31C, 53C, 799G, 1165T, 1499G, 1762T, 1764A, 2170C, 2170G, 2441C, 2525C, 2712C, 2712A, and/or 2712G.
2. The method of claim 1, the method comprising determining nucleotides in the genome of a genotype B HBV isolated from the individual at positions corresponding to nucleotides 1165, 1762, 1764, 2525 or 2712 of SEQ ID NO:1; and comparing the determined nucleotides to nucleotides associated with a pre-disposition to cause HCC, wherein the nucleotides associated with a pre-disposition to cause HCC comprise: 1165T, 1762T, 1764A, 2525C, 2712C, 2712A, or 2712G.
3. The method of claim 2, the method comprising determining nucleotides in the genome of a genotype B HBV isolated from the individual at positions corresponding to nucleotides 1165, 1762, 1764, 2525 and 2712 of SEQ ID NO:1; and comparing the determined nucleotides to nucleotides associated with a pre-disposition to cause HCC, wherein the nucleotides associated with a pre-disposition to cause HCC in genotype B comprise: 1762T and 1764A and 2712A; or 1762T and 1764A and 2712C; or 1762T and 1764A and 2712G; or 1762T and 1764A and 2712T and 2525C; or 1762A and 1764G and 1165T.
4. The method of claim 3, wherein the determining step comprises nucleotide sequencing the HBV genome flanking the nucleotides at positions corresponding to nucleotides 1165, 1762, 1764, 2525 and 2712 of SEQ ID NO:1.
5. The method of claim 3, wherein the determining step comprises amplifying at least a portion of the HBV genome to produce one or more amplification products comprising the nucleotides at the positions corresponding to nucleotides 1165, 1762, 1764, 2525 and 2712 of SEQ ID NO:1.
6. The method of claim 5, comprising contacting the one or more amplification products with one or more probes that hybridize to HCC-associated nucleotides: 1762T and 1764A and 2712A; or 1762T and 1764A and 2712C; or 1762T and 1764A and 2712G; or 1762T and 1764A and 2712T and 2525C; or 1762A and 1764G and 1165T; under conditions to allow for hybridization of a probe to an amplification product only if the amplification product comprises a complementary nucleotide at the position of the HCC-associated nucleotide.
7. The method of claim 6, wherein the hybridization is performed as a line probe assay.
8. The method of claim 3, further comprising determining the genotype of the HBV from the individual.
9. The method of claim 1, the method comprising determining nucleotides in the genome of a genotype C HBV isolated from the individual at positions corresponding to nucleotides 31, 53, 799, 1499, 2170, or 2441; and comparing the determined nucleotides to nucleotides associated with a pre-disposition to cause HCC, wherein the nucleotides associated with a pre-disposition to cause HCC comprise: 31C, 53C, 799G, 1499G, 2170C, 2170G, or 2441C.
10. The method of claim 9, the method comprising a) determining the subtype of a genotype C HBV from the individual, wherein: subtype C1 comprises nucleotides 2783G and 2733A, subtype C2 comprises nucleotides 2783G, 2733C and 3033A, and subtype C3 comprises 2783G, 2733C and 3033C; b1) if the HBV is genotype C1, determining the nucleotides at positions corresponding to nucleotides 31, 53 and 1499 of SEQ ID NO: 1; or b2) if the HBV is genotype C2, determining the nucleotides at positions corresponding to nucleotides 799, 2441 and 2170 of SEQ ID NO: 1; and c) comparing the determined nucleotides to nucleotides at the positions associated with a pre-disposition to cause HCC, wherein the nucleotides associated with a pre-disposition to cause HCC in subtype C1 comprise: 31C; and/or 53C; and/or 1499G; and the nucleotides associated with a pre-disposition to cause HCC in subtype C2 comprise: 2170C; and/or 2170G; and/or 2441C; and/or 799G.
11. The method of claim 10, wherein the determining step comprises nucleotide sequencing the HBV genome flanking the nucleotides at positions corresponding to nucleotides 31, 53, and 1499 of SEQ ID NO:1.
12. The method of claim 10, wherein the determining step comprises nucleotide sequencing the HBV genome flanking the nucleotides at positions corresponding to nucleotides 799, 2441, and 2170 of SEQ ID NO:1.
13. The method of claim 10, wherein the determining step comprises amplifying at least a portion of the HBV genome to produce one or more amplification products comprising the nucleotides at the positions corresponding to nucleotides 31, 53, and 1499 of SEQ ID NO:1.
14. The method of claim 10, wherein the determining step comprises amplifying at least a portion of the HBV genome to produce one or more amplification products comprising the nucleotides at the positions corresponding to nucleotides 799, 2441, and 2170 of SEQ ID NO:1.
15. The method of claim 13, comprising contacting the one or more amplification products with one or more probes that hybridize to HCC-associated nucleotides: 31C; and/or 53C; and/or 1499G; under conditions to allow for hybridization of a probe to an amplification product only if the amplification product comprises a complementary nucleotide at the position of the HCC-associated nucleotide.
16. The method of claim 15, wherein the hybridization is performed as a line probe assay.
17. The method of claim 13, comprising contacting the one or more amplification products with probes that hybridize to HCC-associated nucleotides: 2170C; and/or 2170G; and/or 2441C; and/or 799G; under conditions to allow for hybridization of the probes to the amplification product only if the amplification product comprises a complementary nucleotide at the position of the HCC-associated nucleotide.
18. The method of claim 17, wherein the hybridization is performed as a line probe assay.
19. The method of claim 10, further comprising determining the genotype of the HBV from the individual.
20. The method of claim 1, the method comprising determining the genotype of the HBV, wherein genotype B comprises 2783A, wherein genotype C1 comprises 2783G and 2733A, genotype C2 comprises 2783G, 2733C and 3033A and genotype C3 comprises 2783G, 2733C and 3033C; determining nucleotides 1165, 1762, 1764, 2525 and 2712 of the HBV genome if the HBV is genotype B; and/or determining nucleotides 31 and/or 53 and/or 1499 of the HBV genome if the HBV is C1; and/or determining nucleotides 2170 and/or 2441 and/or 799 of the HBV genome if the HBV is C2; and comparing the determined nucleotides to nucleotides associated with a pre-disposition to cause HCC, wherein nucleotides associated with a pre-disposition to cause HCC in genotype B comprise: 1762T and 1764A and 2712A; or 1762T and 1764A and 2712C; or 1762T and 1764A and 2712G; or 1762T and 1764A and 2712T and 2525C; or 1762A and 1764G and 1165T; wherein nucleotides associated with a pre-disposition to cause HCC in genotype C1 comprise: 31C; and/or 53C; and/or 1499G; and wherein nucleotides associated with a pre-disposition to cause HCC in genotype C2 comprise: 2170C; and/or 2170G; and/or 2441C; and/or 799G; thereby determining the pre-disposition of the individual to develop HCC.
21. A kit for detecting HBV isolates that are associated with the development of hepatocellular carcinoma (HCC), comprising one or more probes which, when contacted with an HBV genome, selectively hybridize to the genome if the genome comprises at least one of the following nucleotides: 31C, 53C, 799G, 1165T, 1499G, 1762T, 1762A, 1764A, 1764G, 2441C, 2170C, 2170G, 2712A, 2712C, 2712G, or 2525C.
22. The kit of claim 21, wherein the probe is linked to a solid support.
23. The kit of claim 21, wherein the probe selectively hybridizes to: 1762T and 1764A and 2712A; and/or 1762T and 1764A and 2712C; and/or 1762T and 1764A and 2712G; and/or 1762T and 1764A and 2712T and 2525C; and/or 1762A and 1764G and 1165T.
24. The kit of claim 21, wherein the probe selectively hybridizes to: 31C; and 53C; and 1499G.
25. The kit of claim 21, wherein the probe selectively hybridizes to: 2170C; and/or 2170G; and/or 2441C; and/or 799G.
26. The kit of claim 21, further comprising primers for amplification of at least a portion of the HBV genome.
27. A computer readable medium comprising: a) code for receiving information describing nucleotides at positions corresponding to nucleotides 31, 53, 799, 1165, 1499, 1762, 1764, 2170, 2441, 2525, or 2712 of SEQ ID NO:1; b) code for comparing the nucleotides received in a) to nucleotides associated with a pre-disposition to cause HCC; and c) code for providing a determination of the pre-disposition of the HBV to cause HCC, wherein nucleotides associated with a pre-disposition to cause HCC comprise: 31C, 53C, 799G, 1165T, 1499G, 1762T, 1764A, 2170C, 2170G, 2441C, 2525C, 2712C, 2712A, or 2712G.